SYSTEM AND METHOD FOR CONVERTING A STANDARD GENERALIZED MARKUP LANGUAGE IN MULTIPLE LANGUAGES



The present invention generally relates to an improved method and system for converting text from a Standard Generalized Markup Language ("SGML") file having a plurality of characters to another specified language. More specifically, it relates to an improved method and system for converting text from a Standard Generalized Markup Language file having a plurality of characters into another specified language using a CONVERTSTR variable, a HTMLCODE variable and a VAL variable, wherein each character represents a tag or text, and each tag has a start and an end.

As a result of the Internet, global communication has become commonplace for most business interactions. It is currently quite typical that a file is required to be available in multiple languages. This is especially useful for World Wide Web communications because the Internet is used in different countries. Thus, there is a clear need for files and web pages to be available in multiple languages. Put differently, files and web pages need to be adapted for use with an international market.

Currently, most web pages that are available in different languages are done on a page-level basis, meaning each and every page is stored in different language versions. For example, in the context of the web, if a Hyper Text Markup Language ("HTML") page is available in five languages, the HTML page in each language constitutes a separate file. So, because five different languages are available, there will be five files for the same HTML page. In other words, there is one file for each available language.

One problem with this prior method is that it is necessary to code the pages for each language. Another problem is that, because the same page must be kept in multiple files for different languages, the prior method uses memory inefficiently. This will become more acute as the use of Personal Digital Assistants ("PDAs") becomes more popular, since these portable computers tend to have less storage memory. In addition, the use of multiple files makes revisions to these files and pages very time consuming and error prone, since each file must be separately revised. Thus, there is a need for an improved method that can translate web pages into different languages.

BRIEF SUMMARY OF THE INVENTION

The present invention is directed to an improved method and system for providing a file in multiple languages. More specifically, it relates to an improved method and system for converting text characters from a Standard Generalized Markup Language file into another specified language using a CONVERTSTR variable, a HTMLCODE variable and a VAL variable, wherein each character represents a tag or text, and each tag has a start and an end.

The present invention provides a method that includes the steps of (a) reading a character from the file, (b) determining whether the read character is the start of a tag, (c) adding the read character to the CONVERTSTR variable when the read character is not the start of a tag, (d) repeating steps (a), (b) and (c) for a next character until a read character is the start of a tag, (e) converting the CONVERTSTR variable into the specified language, and (f) adding the converted CONVERTSTR variable to the HTMLCODE variable.

The present invention also provides a system that includes a HTMLCODE variable for defining the strings in the Standard Generalized Markup Language coding, a CONVERTSTR variable for defining the strings that are to be converted into the specified language, a VAL variable for defining the strings that have been converted into the specified language, and a translator for translating the strings in the CONVERTSTR variable into the specified language.
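The relationship between the three variables and the translator can be sketched as a small data structure. This is only an illustrative sketch in Python, not the code configuration of the drawings: the class name `ConversionState`, its methods, and the `Translator` callable type are hypothetical names introduced here, and `str.upper` merely stands in for a real translator.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical translator interface: takes source text, returns converted text.
Translator = Callable[[str], str]


@dataclass
class ConversionState:
    translator: Translator
    htmlcode: str = ""    # HTMLCODE: markup plus text that is already converted
    convertstr: str = ""  # CONVERTSTR: text still waiting to be converted
    val: str = ""         # VAL: the most recently converted string

    def add_markup(self, chars: str) -> None:
        """Tag characters go straight into HTMLCODE, unconverted."""
        self.htmlcode += chars

    def add_text(self, chars: str) -> None:
        """Text characters are collected in CONVERTSTR."""
        self.convertstr += chars

    def flush(self) -> None:
        """Convert pending text into VAL, append VAL to HTMLCODE, reset CONVERTSTR."""
        if self.convertstr != "":
            self.val = self.translator(self.convertstr)
            self.htmlcode += self.val
        self.convertstr = ""


# Usage: str.upper is a placeholder for a real translator.
state = ConversionState(translator=str.upper)
state.add_markup("<p>")
state.add_text("hello")
state.flush()
state.add_markup("</p>")
print(state.htmlcode)  # <p>HELLO</p>
```

Keeping the translator as a plain callable reflects the statement that any available translator may be substituted.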



DESCRIPTION OF THE DRAWINGS

FIGURE 1 is a block diagram of a network system in which the present invention can be implemented;

FIG. 2 is a flow chart illustrating the overall preferred functionality of the method of the present invention;

FIG. 3 is a flow chart illustrating the preferred functionality of the parsing method of the present invention shown in FIG. 2;

FIG. 4 illustrates the preferred code configuration of the preferred functionality of the parsing method;

FIG. 5 is an exemplary web content in a Hypertext Markup Language file that can be used with the present invention; and,

FIG. 6 illustrates the Hypertext Markup Language file shown in FIG. 5 that is displayed in two different languages.

GLOSSARY OF TERMS AND ACRONYMS

The following terms and acronyms are used throughout the detailed description:

Client-Server. A model of interaction in a distributed system in which a program at one site sends a request to a program at another site and waits for a response. The requesting program is called the "client," and the program which responds to the request is called the "server." In the context of the World Wide Web (discussed below), the client is a "Web browser" (or simply "browser") which runs on the computer of a user; the program which responds to browser requests by serving Web pages, or other types of Web content, is commonly referred to as a "Web server."

Content. A set of executable instructions that is served by a server to a client and that is intended to be executed by the client so as to provide the client with certain functionality. Web content refers to content that is meant to be executed by operation of a Web browser. Web content, therefore, may non-exhaustively include one or more of the following: HTML code, SGML code, XML code, XSL code, CSS code, Java applet, JavaScript and C-"Sharp" code.



CONVERTSTR variable. A variable of the present invention that defines the strings that are to be converted into a specified language.

HTMLCODE variable. A variable of the present invention that defines the strings in the Standard Generalized Markup Language coding.

Hyper Text Markup Language ("HTML"). A standard coding convention and set of codes for attaching presentation and linking attributes to informational content within documents. During a document authoring stage, the HTML codes (referred to as "tags") are embedded within the informational content of the document. When the Web document (or HTML document) is subsequently transferred from a Web server to a browser, the codes are interpreted by the browser and used to display the document. Additionally, in specifying how the Web browser is to display the document, HTML tags can be used to create links to other Web documents (commonly referred to as "hyperlinks"). For more information on HTML, see Ian S. Graham, The HTML Source Book, John Wiley and Sons, Inc., 1995 (ISBN 0471-11894-4).

Hyper Text Transfer Protocol ("HTTP"). The standard World Wide Web client-server protocol used for the exchange of information (such as HTML documents, and client requests for such documents) between a browser and a Web server. HTTP includes a number of different types of requests, which can be sent from the client to the server to request different types of server actions. For example, a "GET" request, which has the format GET <URL>, causes the server to return the document or file located at the specified URL.

Hyperlink. A navigational link from one document to another, from one portion (or component) of a document to another, or to a Web resource, such as a Java applet. Typically, a hyperlink is displayed as a highlighted word or phrase that can be selected by clicking on it using a mouse to jump to the associated document or document portion or to retrieve a particular resource.

Hypertext System. A computer-based informational system in which documents (and possibly other types of data entities) are linked together via hyperlinks to form a user-navigable "web."

Internet. A collection of interconnected or disconnected networks (public and/or private) that are linked together by a set of standard protocols (such as TCP/IP and HTTP) to form a global, distributed network. (While this term is intended to refer to what is now commonly known as the Internet, it is also intended to encompass variations which may be made in the future, including changes and additions to existing standard protocols.)



Parser. An algorithm or program to determine the syntactic structure of a sentence or string of symbols in some language. A parser normally takes as input a sequence of tokens output by a lexical analyzer. It may produce some kind of abstract syntax tree as output.

Personal Digital Assistant (PDA). A small hand-held computer used to write notes, track appointments, manage email and browse the web, etc., generally with far less storage capacity than a desktop computer.

Plug-In. A file containing data used to alter, enhance, or extend the operation of a parent application program. For example, a browser plug-in is a file that is configured to alter, enhance or extend the operations of a web browser.

String. A string is a sequence of data values, usually bytes, which usually stand for characters (a "character string"). The mapping between values and characters is determined by the character set, which is itself specified implicitly or explicitly by the environment in which the string is being interpreted.

Tag. An SGML, HTML, or XML token representing the beginning (start tag: "<p ...>") or end (end tag: "</p>") of an element. In normal SGML syntax (and always in XML), a tag starts with a "<" and ends with a ">". In HTML jargon, the term "tag" is often used for an "element".

Text Characters. Text characters represent the actual text that is to be displayed to the users within an SGML file. In contrast, the tags in the SGML file generally define the format of the text characters (e.g., <bold></bold>).

URL (Uniform Resource Locator). A unique address which fully specifies the location of a file or other resource on the Internet or a network. The general format of a URL is protocol://machine address:port/path/filename.
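As a brief illustration of the general URL format given above, the following Python sketch splits a URL into its parts using the standard library's `urllib.parse.urlsplit`. The address `www.example.com`, the port and the path are made-up values chosen only for the example.

```python
from urllib.parse import urlsplit

# A made-up URL following protocol://machine address:port/path/filename.
parts = urlsplit("http://www.example.com:8080/docs/test.html")

print(parts.scheme)    # http              (protocol)
print(parts.hostname)  # www.example.com   (machine address)
print(parts.port)      # 8080              (port)
print(parts.path)      # /docs/test.html   (path and filename)
```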

VAL variable. A variable of the present invention that defines the strings that have been converted into the specified language.

Web Browser. A browser for browsing the World-Wide Web. Currently, the two main standard browsers are Microsoft Internet Explorer™ from Microsoft Corporation and Netscape Navigator™ from Netscape Communications Corporation.

Web Site. A computer system that serves informational content over a network using the standard protocols of the World Wide Web. Typically, a Web site corresponds to a particular Internet domain name, such as "HP.com," and includes the content associated with a particular organization. As used herein, the term is generally intended to encompass both (i) the hardware/software server components that serve the informational content over the network, and (ii) the "back end" hardware/software components, including any non-standard or specialized components, that interact with the server components to perform services for Web site users. More importantly, a Web site can have additional functionality; for example, a Web site may have the ability to print documents, scan documents, etc.

World Wide Web ("Web"). Used herein to refer generally to both (i) a distributed collection of interlinked, user-viewable hypertext documents (commonly referred to as Web documents or Web pages) that are accessible via the Internet, and (ii) the client and server software components which provide user access to such documents using standardized Internet protocols. Currently, the primary standard protocol for allowing applications to locate and acquire Web documents is HTTP, and the Web pages are encoded using HTML. However, the terms "Web" and "World Wide Web" are intended to encompass future markup languages and transport protocols which may be used in place of (or in addition to) HTML and HTTP.



DETAILED DESCRIPTION

Broadly stated, the present invention is directed to a method and system for converting text from an SGML file having a plurality of characters into another specified language, wherein each character represents a tag or text. Each tag has a start and an end. The method and system provide a way to translate a file into multiple languages using a CONVERTSTR variable, a HTMLCODE variable and a VAL variable. As a result, multiple languages can be displayed with the use of a single file using the present invention.

The present invention first reads a character from the SGML file, and determines whether the read character is the start of a tag. The read character is added to the CONVERTSTR variable when it is determined that the read character is not the start of a tag, and the process is repeated for each successive character until a read character is the start of a tag. Upon finding a read character that is the start of a tag, the CONVERTSTR variable is converted into the specified language and added to the HTMLCODE variable.

The network system in which the present invention can be implemented is shown in FIG. 1, and indicated generally at 10. Two client computers 12 are connected to a server computer 14 configured for the present invention via a network 16. Although the Internet is the preferable network connection 16 because it provides a very flexible and universal system of communication, other networks, such as an intranet, are contemplated by the present invention as well.

For example, a web-based implementation, although preferred, is not the only option available. The present invention can be configured and coded to work with different network or operating systems. In fact, the present invention can be implemented without a network system at all. It can also be implemented with the use of a storage medium, such as a CD-ROM, or installed software, such as a browser plug-in, on a standalone computer (not shown).

As a result of the many possible implementations for the present invention, an explanation of the current preferred embodiment of the network topology is given as an example. The complexity of the various available implementations is furthered by the use of different file formats that can change as a result, and the software or firmware needed to work with the given desired file formats.



In trying to present a clearer description of the present invention, a web-based implementation will be used as an example. However, it should be understood that others skilled in the art can appreciate the implementations of the various systems and configurations, and these implementations are within the scope of the present invention.

With a web-based implementation shown as an example, each client computer 12 includes a browser 18 for communicating. As shown, the server computer 14 also includes software 20 for translating HTML files 22 from any of the client computers 12 to translated HTML files 24, which are forwarded to the client computer that originated the translation request. It should be noted that although an HTML file is described as an example, any type of SGML file, such as XML, can be used with the present invention. Thus, it should be appreciated that other implementations with the use of other types of SGML files are contemplated, and are within the scope of the present invention.

The preferred code configuration is shown in FIG. 2, and indicated generally at 40. Because the present invention can be implemented in numerous ways, the preferred code configuration is broken down into various variables and loops. For example, the names given to the variables can be changed, and the loops can also be altered slightly. However, a person of ordinary skill in the art would appreciate that various methods can be implemented following the general breakdown of the code configuration. Consequently, other similar code configurations and methods are contemplated and are within the scope of the present invention.

In the preferred embodiment, there are three main variables, specifically the HTMLCODE, CONVERTSTR, and VAL variables. The HTMLCODE variable defines the strings in the Standard Generalized Markup Language coding. The remaining variables relate to the conversion. The CONVERTSTR variable defines the strings that are to be converted into the specified language, and the VAL variable defines the strings that have been converted into the specified language. Although not shown, the present invention includes a translator for translating the strings in the CONVERTSTR variable into the specified language. The translator can be any commercially available translator that can be configured to work with the present invention. The translator contemplated can be any general translator that can translate phrases or sentences. In fact, most of these translators are available on the Internet. The web site at http://www.web-a-dex.com/translate.htm provides a list of some of these translators that are freely available on the Internet. However, it should be noted that the present invention can be implemented with other available translators, and the various implementations with different translators are within the scope of the present invention.

The variables are initialized by being set to empty (e.g., HTMLCODE = " "; CONVERTSTR = " "; VAL = " "). Since characters must be read from the HTML file, the code configuration includes a command to read the character (i.e., READ CHAR;). Overall, there are four important loops. The first loop runs through all the characters of the file, and converts the text characters. While it is not the end of the file (i.e., EOF), the CONVERTSTR and VAL variables are initialized. Another loop will simply move the HTML tags to the HTMLCODE variable. There is another loop that reads all the characters for placement in the CONVERTSTR variable until a read character is the start of a tag. The last loop provides that, if the CONVERTSTR variable is not empty (i.e., if CONVERTSTR != " "), the VAL variable is run, which is defined as the function to translate the strings in the CONVERTSTR variable into the specified language. Then, the VAL variable is finally added to the HTMLCODE variable to convert the file back to HTML format after the translation is done.

Turning to an important aspect of the present invention, a flow chart of the preferred functionality of a method is shown in FIG. 3, and indicated generally at 50. The method is initiated by a user, through a user interface, requesting an HTML file to be displayed in another specified language (block 52). The HTML file requested for the translation is first downloaded to the cache memory (block 54), and the HTML file is processed through an HTML parser, which will initiate a parsing method (block 56) shown in FIG. 4.

Turning to FIG. 4, a flow chart of the preferred functionality of the parsing method is shown, and indicated generally at 56. A HTMLCODE variable is first initialized (block 58). In practice, the HTMLCODE variable is initialized by defining it as empty (i.e., HTMLCODE = " ";). Then, a first character is read from the HTML file (i.e., read char;) (block 60). It is next determined whether the read character (e.g., first character) is at the end of the file (block 62). Since it is the first character, it is unlikely that it is at the end of the file. However, if the read character is at the end of the file (block 62), the parsing method will end and continue back to FIG. 3. Otherwise, the CONVERTSTR variable will be initialized (block 66) if the read character is not at the end of the file (block 62). Similarly, to initialize the CONVERTSTR variable in the preferred embodiment, it is defined as empty (i.e., CONVERTSTR = " ").

From the read character, which is still the first character of the HTML file, it is determined whether the read character is the start of a tag (i.e., char = "<") (block 68). If the read character is not the start of a tag (block 68), the method will loop to a second subroutine, wherein the first step is to determine whether the read character is the start of a tag (block 70). However, for the case of the first character, the process will continue to the next step of determining whether the read character (e.g., the first character) is an end of a tag (i.e., char = ">") (block 72), since an HTML file generally begins with the start of a tag (i.e., "<"). If the read character is an end of a tag (block 72), the process will be forwarded to the next subroutine, specifically the step of determining whether the character is the start of a tag (block 70).

However, in the case of the first character, because it cannot be the end of a tag (block 72), it is next determined whether the read character is at the end of the file (block 74), which it likely will not be. However, if it is at the end of the file (block 74), the read character will be forwarded to the second subroutine starting with the step of determining whether the character is the start of a tag (block 70).

The read character, if it is not at the end of the file, is added to the HTMLCODE variable (i.e., HTMLCODE = HTMLCODE + char) (block 76), which is followed by the next character being read (block 78). This newly read next character is looped back to the step of determining whether the read character is the end of a tag (block 72). The subroutine is repeated until a character is either the end of a tag (i.e., char = ">") (block 72) or at the end of the file (block 74). The read character then continues to the second subroutine, which determines whether it is the start of a tag (i.e., char = "<") (block 70).

From the second subroutine, if the read character is not the start of a tag (block 70), it is next determined whether the read character is at the end of the file (block 80). If the read character is not at the end of the file (block 80), the read character is added to the CONVERTSTR variable (i.e., CONVERTSTR = CONVERTSTR + CHAR) (block 82), which is followed by a reading of the next character (i.e., READ CHAR;) (block 84). For the next read character, the method loops back to the starting step of the second subroutine, specifically determining whether the character is the start of a tag (block 70).

Each character is read and looped back until either a character is the start of a tag (block 70) or the character is at the end of the file (block 80). The method continues on to the next step of determining whether the CONVERTSTR variable is empty (i.e., CONVERTSTR = " "?) (block 86). If it is empty (block 86), the method loops back to the step of determining whether the read character is at the end of the file (block 62). Otherwise, the string in the CONVERTSTR variable is translated to the specified language, which is placed in a VAL variable (i.e., VAL = CONVERT(CONVERTSTR)) (block 88). The converted string is added to the HTMLCODE variable (i.e., HTMLCODE = HTMLCODE + VAL) (block 90). At this point, the earlier read character that was either the start of a tag (block 70) or at the end of the file (block 80) is looped back to the step of determining whether the read character is at the end of the file (block 62), and the process starts again.
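The parsing loop described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the code configuration of the drawings: the function name `parse_and_translate` and the `convert` callable (standing in for CONVERT) are hypothetical, the file is assumed to be already loaded into a string, and the closing ">" is copied into HTMLCODE together with the rest of the tag so that the delimiter is never handed to the translator.

```python
from typing import Callable


def parse_and_translate(source: str, convert: Callable[[str], str]) -> str:
    """Copy tags verbatim and pass each run of text through `convert`."""
    htmlcode = ""                      # HTMLCODE initialized to empty (block 58)
    pos = 0                            # index of the current "read character"
    end = len(source)

    while pos < end:                   # end-of-file test (block 62)
        convertstr = ""                # CONVERTSTR initialized to empty (block 66)

        # First subroutine: a tag is moved into HTMLCODE (blocks 68, 72-78).
        if source[pos] == "<":
            while pos < end and source[pos] != ">":
                htmlcode += source[pos]
                pos += 1
            if pos < end:              # assumption: ">" is kept with its tag
                htmlcode += source[pos]
                pos += 1

        # Second subroutine: text is collected until the next "<" (blocks 70, 80-84).
        while pos < end and source[pos] != "<":
            convertstr += source[pos]
            pos += 1

        if convertstr != "":           # block 86
            val = convert(convertstr)  # VAL = CONVERT(CONVERTSTR) (block 88)
            htmlcode += val            # HTMLCODE = HTMLCODE + VAL (block 90)

    return htmlcode


# Usage: str.upper is only a placeholder for a real translator.
print(parse_and_translate("<p>Hello</p>", str.upper))  # <p>HELLO</p>
```

String concatenation mirrors the HTMLCODE = HTMLCODE + char notation of the description; a production version would more likely collect pieces in a list and join them once.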

After all the characters have been processed, or more specifically, after a read character is determined to be at the end of the file (block 62), the parsing method is then completed. The method continues back to FIG. 3, and it is then determined whether the parsing of the HTML file was successful (block 92). If it was not successful (block 92), an error message is returned to the user (block 94) and the HTML file without the translation will be displayed to the user (block 96). On the other hand, if the parsing was successful (block 92), the translated HTML file containing the parsed codes will be saved to the cache memory (block 98) and displayed to the user (block 100).
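The surrounding request flow (blocks 54 through 100) might be sketched as below. The description does not say how success or failure of the parsing is detected, so this sketch assumes that a failure surfaces as a raised exception; the function name `display_in_language`, the `parse` callable and the dictionary used as cache memory are all hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple


def display_in_language(
    name: str,
    html: str,
    parse: Callable[[str], str],
    cache: Dict[str, str],
) -> Tuple[str, Optional[str]]:
    """Return (page to display, error message or None)."""
    cache[name] = html                      # file downloaded to cache (block 54)
    try:
        translated = parse(html)            # parsing method (block 56)
    except Exception as exc:                # assumption: failure raises (block 92)
        # Error message plus the untranslated file (blocks 94, 96).
        return html, f"Translation failed: {exc}"
    cache[name] = translated                # translated file saved to cache (block 98)
    return translated, None                 # displayed to the user (block 100)


# Usage with a stand-in parser that swaps one word.
cache: Dict[str, str] = {}
page, error = display_in_language(
    "test.html", "<p>hi</p>", lambda h: h.replace("hi", "salut"), cache
)
print(page, error)  # <p>salut</p> None
```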

An exemplary web content in an HTML file is shown in FIG. 5, and a translated and an untranslated version of the web page for the HTML file are shown in FIG. 6. FIG. 5 illustrates a very simple HTML file, titled "test.html," with the text, "Hello World!" Notice that the text in the original HTML file (i.e., test.html) is in the English language, and that the present invention provides a way to parse through the test.html file and translate the text into the French language, which now reads "Bonjour Monde!" Also, please notice that both web pages bear the same file name, specifically "test.html."
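The "Hello World!" example can be reproduced with a few lines of Python. The sketch below deliberately uses a different technique from the character-by-character method of the description: it splits the file on tags with a regular expression, which is enough to show the effect. The markup of `sample` is an assumption, since the exact contents of FIG. 5 are not reproduced here, and the `PHRASES` table is a hypothetical stand-in for a real translator.

```python
import re

# Hypothetical phrase table standing in for an English-to-French translator.
PHRASES = {"Hello World!": "Bonjour Monde!"}


def translate_text_nodes(html: str, phrases: dict) -> str:
    """Leave tags untouched and look up every text piece in `phrases`."""
    # The capturing group makes re.split keep the tags in the result list.
    pieces = re.split(r"(<[^>]*>)", html)
    out = []
    for piece in pieces:
        if piece.startswith("<"):
            out.append(piece)                      # tag: copied as-is
        else:
            out.append(phrases.get(piece, piece))  # text: translated if known
    return "".join(out)


# Assumed contents for a file like test.html.
sample = "<html><body><p>Hello World!</p></body></html>"
print(translate_text_nodes(sample, PHRASES))
# <html><body><p>Bonjour Monde!</p></body></html>
```

Only the text between the tags changes, so the same file name and markup serve both language versions.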
From the foregoing description, it should be understood that an improved system and method for providing a file in multiple languages has been shown and described, which has many desirable attributes and advantages. The system and method provide a way for an HTML file to be displayed in multiple languages. By the use of a parsing method, the present invention is able to isolate the text in the file for translation into other desired languages. Thus, multiple languages can be displayed with the use of a single HTML file. In addition, no specific modifications or coding are required for displaying the same HTML file in other languages, and any updating of the file can be made more consistently and with less effort.

While various embodiments of the present invention have been shown and described, it should be understood that other modifications, substitutions and alternatives are apparent to one of ordinary skill in the art. Such modifications, substitutions and alternatives can be made without departing from the spirit and scope of the invention, which should be determined from the appended claims.

Various features of the invention are set forth in the appended claims.